
Fixed scrapers #677

Merged
lalalaurentiu merged 4 commits into peviitor-ro:main from lalalaurentiu:main on Feb 9, 2026

@lalalaurentiu
Collaborator

This pull request updates several company-specific job scraper scripts to use more modern and robust APIs, improves consistency in request formatting, and simplifies the extraction logic. The changes primarily focus on switching to JSON-based POST requests, updating endpoints, and cleaning up or refactoring the parsing logic for job listings.

API and Endpoint Updates:

  • sites/atkinsrealis.py: Switched from scraping HTML pages to using a JSON-based POST API (https://slihrms.wd3.myworkdayjobs.com/wday/cxs/slihrms/Careers/jobs). The script now paginates using the API's offset and limit fields and extracts job data directly from the JSON response, simplifying the code and improving reliability.
  • sites/hcltechnologies.py: Migrated from scraping HTML to using the official recruiting API endpoint (https://careers.hcltech.com/services/recruiting/v1/jobs) with JSON payloads. The code now fetches and paginates jobs using the API, removing complex HTML parsing and city/county translation logic.
  • sites/hm.py: Updated the job search request to use a JSON POST with correct headers and a more targeted payload for Romania jobs, improving accuracy and reliability.
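The offset/limit pagination described for sites/atkinsrealis.py can be sketched with the standard library alone. This is a minimal sketch, not the code from the PR: the endpoint URL comes from the description above, but the request keys (`appliedFacets`, `limit`, `offset`, `searchText`) and the response keys (`total`, `jobPostings`) are assumptions about the Workday API. The fetch function is passed in as a parameter so the paging loop can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Iterator

# Endpoint quoted in the PR description for sites/atkinsrealis.py.
WORKDAY_URL = "https://slihrms.wd3.myworkdayjobs.com/wday/cxs/slihrms/Careers/jobs"


def fetch_page(offset: int, limit: int) -> dict:
    """POST one page request to the Workday jobs endpoint and decode the JSON reply.

    ASSUMPTION: the payload keys below follow common Workday career-site usage;
    they are not taken from the actual diff.
    """
    payload = {"appliedFacets": {}, "limit": limit, "offset": offset, "searchText": ""}
    request = urllib.request.Request(
        WORKDAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def iter_job_postings(fetch: Callable[[int, int], dict], limit: int = 20) -> Iterator[dict]:
    """Yield every posting by advancing `offset` in steps of `limit`.

    ASSUMPTION: each response carries a `total` count and a `jobPostings` list.
    `total` is read once, from the first page, and reused for the stop condition.
    """
    offset = 0
    total = None
    while total is None or offset < total:
        page = fetch(offset, limit)
        if total is None:
            total = page.get("total", 0)
        postings = page.get("jobPostings", [])
        if not postings:
            break  # stop early if the server returns an empty page
        yield from postings
        offset += limit
```

Usage, assuming the endpoint is reachable and answers with the assumed shape: `for job in iter_job_postings(fetch_page): print(job.get("title"))`. Keeping the HTTP call separate from the loop means the stop condition (`offset < total`, or an empty page) can be checked against a fake fetcher.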

Request Formatting and Consistency:

  • sites/goodyear.py: Adjusted the order of location IDs in the post_data payload for consistency and corrected the logic for extracting the remote field from job data. [1] [2]
  • sites/hm.py: Added proper request headers (Content-Type and User-Agent) for the POST request to ensure compatibility with the API.
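A JSON POST with explicit Content-Type and User-Agent headers, as described for sites/hm.py, might look like the minimal sketch below. The URL, payload fields, and User-Agent string are hypothetical placeholders; the real H&M endpoint and the Romania-specific payload are not shown in this PR description.

```python
import json
import urllib.request


def build_json_post(url: str, payload: dict, user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is the JSON-encoded payload."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",  # tells the API the body is JSON
            "User-Agent": user_agent,  # some APIs reject urllib's default agent string
        },
        method="POST",
    )


def send_json_post(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# HYPOTHETICAL values for illustration only.
example_request = build_json_post(
    "https://example.com/api/jobs/search",
    {"country": "Romania", "page": 1},
    "Mozilla/5.0 (compatible; job-scraper)",
)
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so they are read back as `Content-type` and `User-agent` (for example via `example_request.get_header("Content-type")`).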

Code Simplification and Cleanup:

  • sites/atkinsrealis.py & sites/hcltechnologies.py: Removed unused imports and legacy code related to HTML parsing, city/county translation, and manual pagination, resulting in cleaner and more maintainable scripts. [1] [2]

These changes collectively modernize the scrapers, making them more robust against website changes and easier to maintain.

lalalaurentiu merged commit cd3c31c into peviitor-ro:main on Feb 9, 2026